Parallelization of Sparse Cholesky Factorization on an SMP Cluster

Authors

Shigehisa Satoh

Kazuhiro Kusano

Yoshio Tanaka

Motohiko Matsuda

Mitsuhisa Sato

Abstract

In this paper, we present parallel implementations of the sparse Cholesky factorization kernel from the SPLASH-2 programs to evaluate the performance of a Pentium Pro based SMP cluster. Solaris threads and remote memory operations are used for intranode parallelism and internode communication, respectively. Sparse Cholesky factorization is a typical irregular application with a high communication-to-computation ratio and no global synchronization between steps. We parallelized it efficiently using asynchronous message handling instead of lock-based mutual exclusion between nodes, because synchronization between nodes reduces performance significantly. We also found that the mapping of processes to processors on an SMP cluster affects performance, especially when the communication latency cannot be hidden.
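The contrast between asynchronous message handling and lock-based mutual exclusion can be illustrated with a minimal single-process sketch. This is not the paper's implementation, which uses Solaris threads and remote memory operations across cluster nodes. Here Python threads stand in for nodes, and the names `run_owner`, `simulate`, and the `columns` dictionary are hypothetical. The idea shown is only that senders post update messages and continue without waiting, while the owning thread is the sole writer of its data, so the data itself needs no lock.

```python
import queue
import threading


def run_owner(inbox, columns):
    """Owner loop: apply update messages to the columns this 'node' owns."""
    while True:
        msg = inbox.get()
        if msg is None:            # sentinel: no more updates will arrive
            break
        col, delta = msg
        columns[col] += delta      # only the owner thread writes to `columns`


def simulate(num_senders=4, updates_per_sender=1000):
    """Run several senders that post updates to one owner; return final columns."""
    inbox = queue.Queue()          # stands in for the internode message channel
    columns = {0: 0.0, 1: 0.0}     # toy stand-in for owned matrix columns
    owner = threading.Thread(target=run_owner, args=(inbox, columns))
    owner.start()

    def sender():
        for i in range(updates_per_sender):
            # Post the update and move on; the sender never waits for the
            # owner and never locks the column data.
            inbox.put((i % 2, 1.0))

    senders = [threading.Thread(target=sender) for _ in range(num_senders)]
    for t in senders:
        t.start()
    for t in senders:
        t.join()

    inbox.put(None)                # all senders finished; tell the owner to stop
    owner.join()
    return columns


if __name__ == "__main__":
    print(simulate())
```

The queue here is itself thread-safe internally, so the sketch only models the ownership pattern, not the cost of real internode communication.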


Related resources

Experiments with Cholesky Factorization on Clusters of SMPs

Cholesky factorization of large dense matrices is an integral part of many applications in science and engineering. In this paper we report on experiments with different parallel versions of Cholesky factorization on modern high-performance computing architectures. For the parallelization of Cholesky factorization we utilized various standard linear algebra software packages and present perform...


Implementing a parallel matrix factorization library on the cell broadband engine

Matrix factorization (often called decomposition) is a frequently used kernel in a large number of applications ranging from linear solvers to data clustering and machine learning. The central contribution of this paper is a thorough performance study of four popular matrix factorization techniques, namely, LU, Cholesky, QR, and SVD on the STI Cell broadband engine. The paper explores algori...


Cholesky Factorization of Band Matrices Using Multithreaded BLAS

In this paper we analyze the efficacy of the LAPACK blocked routine for the Cholesky factorization of symmetric positive definite band matrices on Intel SMP platforms using two multithreaded implementations of BLAS. We also propose strategies that alleviate some of the performance degradation that is observed, and which is basically due to the use of multiple threads when dealing with problems ...


An Algorithm-by-Blocks for SuperMatrix Band Cholesky Factorization

We pursue the scalable parallel implementation of the factorization of band matrices with medium to large bandwidth targeting SMP and multi-core architectures. Our approach decomposes the computation into a large number of fine-grained operations exposing a higher degree of parallelism. The SuperMatrix run-time system allows an out-of-order scheduling of operations that is transparent to the pr...


High Performance Cholesky Factorization via Blocking and Recursion That Uses Minimal Storage

We present a high performance Cholesky factorization algorithm, called BPC for Blocked Packed Cholesky, which performs as well as or better than the LAPACK DPOTRF subroutine, but with about the same memory requirements as the LAPACK DPPTRF subroutine, which runs at level 2 BLAS speed. Algorithm BPC only calls DGEMM and level 3 kernel routines. It combines a recursive algorithm with blocking and ...


Publication date: 1999
